Empirical Comparison of Boosting
Abstract
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced the variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates are used in conjunction with no pruning, as well as when the data is backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only "hard" areas but also outliers and noise.
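As a rough, modern illustration of the kind of comparison described in this abstract, the following minimal sketch (not the authors' original experimental code) contrasts a single decision tree, a Naive-Bayes classifier, and Bagging/AdaBoost ensembles of each using scikit-learn; the synthetic dataset, base-learner settings, and ensemble sizes are illustrative assumptions only.

```python
# Minimal sketch: single learners vs. Bagging and AdaBoost ensembles.
# Dataset, parameters, and ensemble sizes are illustrative assumptions,
# not the configuration used in the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                      n_estimators=50, random_state=0),
    "AdaBoost trees": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                         n_estimators=50, random_state=0),
    "Naive-Bayes": GaussianNB(),
    "AdaBoost Naive-Bayes": AdaBoostClassifier(GaussianNB(), n_estimators=50,
                                               random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    error = 1.0 - model.score(X_te, y_te)  # test-set classification error
    print(f"{name:22s} test error = {error:.3f}")
```

In line with the findings summarized above, one would expect the bagged and boosted trees to improve on the single tree, while boosting a very stable learner such as Naive-Bayes may not help and can even increase variance.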
Similar references
Combining Bagging and Boosting
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diverse set of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...
Combining Bagging and Additive Regression
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diverse set of regression models using the same learning algorithm as the base learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in t...
Boosted Image Classification: An Empirical Study
The rapid pace of research in the fields of machine learning and image comparison has produced powerful new techniques in both areas. At the same time, research has been sparse on applying the best ideas from both fields to image classification and other forms of pattern recognition. This paper combines boosting with state-of-the-art methods in image comparison to carry out a comparative evaluat...
Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers
We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods ...
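For context, the margin referred to in this excerpt is a standard quantity for voting classifiers (the usual definition, not something specific to this paper). For a binary combination $f(x) = \sum_t \alpha_t h_t(x)$ with base classifiers $h_t(x) \in \{-1, +1\}$ and weights $\alpha_t \ge 0$, the normalized margin on a labeled example $(x, y)$ with $y \in \{-1, +1\}$ is

\[
\operatorname{margin}(x, y) \;=\; \frac{y \sum_t \alpha_t h_t(x)}{\sum_t \alpha_t} \;\in\; [-1, 1],
\]

which is positive exactly when the weighted vote classifies $x$ correctly; bounds of the kind described above are stated in terms of the empirical distribution of this quantity over the training sample.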
An Empirical Comparison of Pruning Methods for Ensemble Classifiers
Many researchers have shown that ensemble methods such as Boosting and Bagging improve the accuracy of classification. Boosting and Bagging perform well with unstable learning algorithms such as neural networks or decision trees. Pruning decision tree classifiers is intended to make trees simpler and more comprehensible and to avoid over-fitting. However, it is known that pruning individual classif...
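To make the pruning-versus-no-pruning contrast concrete, here is a minimal sketch (illustrative only, not taken from any of the papers above) comparing bagged unpruned trees with bagged cost-complexity-pruned trees in scikit-learn; the dataset, the ccp_alpha value, and the ensemble size are assumptions.

```python
# Minimal sketch: Bagging with unpruned vs. cost-complexity-pruned trees.
# Dataset and hyperparameters are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

unpruned = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                             n_estimators=50, random_state=1)
pruned = BaggingClassifier(DecisionTreeClassifier(ccp_alpha=0.01, random_state=1),
                           n_estimators=50, random_state=1)

for name, model in [("bagged unpruned trees", unpruned),
                    ("bagged pruned trees (ccp_alpha=0.01)", pruned)]:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Whether pruning the individual trees helps or hurts the ensemble is exactly the kind of question these empirical comparisons investigate.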
Boosting Lazy Decision Trees
This paper explores the problem of how to construct lazy decision tree ensembles. We present and empirically evaluate a relevance-based boosting-style algorithm that builds a lazy decision tree ensemble customized for each test instance. From the experimental results, we conclude that our boosting-style algorithm significantly improves the performance of the base learner. An empirical comparison...